46 research outputs found

    Mining of patient data: towards better treatment strategies for depression

    An intelligent system based on data-mining technologies that can assist in the prevention and treatment of depression is described. The system integrates three different kinds of patient data as well as data describing the mental health of therapists and their interactions with patients. The system allows the different data to be analysed in a conjoint manner using both traditional data-mining techniques and tree-mining techniques. Interesting patterns can emerge in this way to explain various processes and dynamics involved in the onset, treatment and management of depression, and to help practitioners develop better prevention and treatment strategies.

    Alternative approach to tree-structured web log representation and mining

    More recent approaches to web log data representation aim to capture user navigational patterns with respect to the overall structure of the web site. One such representation, tree-structured log files, is the focus of this work. Most existing methods for analyzing such data are based on the use of frequent subtree mining techniques to extract frequent user activity and navigational paths. In this paper we evaluate the use of other standard data mining techniques enabled by a recently proposed structure-preserving flat data representation for tree-structured data. The initially proposed framework was adjusted to better suit the web log mining task. Experimental evaluation is performed on two real-world web log datasets, and comparisons are made with an existing state-of-the-art classifier for tree-structured data. The results show the great potential of the method in enabling the application of a wider range of data mining/analysis techniques to tree-structured web log data.
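The abstract does not spell out the flat representation itself, but one widely used structure-preserving flat encoding for rooted ordered labelled trees is the pre-order string with a backtrack marker. The sketch below is illustrative only and not necessarily the exact representation used in the paper:

```python
# Hedged sketch: encode a rooted ordered labelled tree as a flat pre-order
# string with "-1" marking a move back up the tree. The nested-tuple tree
# representation is an assumption made for this example.

BACKTRACK = "-1"

def encode(label, children):
    """Encode a tree given as (label, [child, ...]) nested tuples into a
    flat pre-order token list; '-1' marks a backtrack to the parent."""
    parts = [label]
    for child in children:
        parts.extend(encode(*child))
        parts.append(BACKTRACK)
    return parts

# A small web-site navigation tree: home -> {products, about -> team}
site = ("home", [("products", []), ("about", [("team", [])])])
print(encode(*site))
```

Because the backtrack markers are retained, the original tree can be reconstructed from the flat string, which is what lets conventional flat-data mining techniques be applied without losing structural information.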

    Human-like rule optimization for continuous domains

    When using machine learning techniques for data mining purposes, one of the main requirements is that the learned rule set is represented in a comprehensible form. Simpler rules are preferred as they are expected to perform better on unseen data. At the same time, the rules should be specific enough so that the misclassification rate is kept to a minimum. In this paper we present a rule optimizing technique motivated by psychological studies of human concept learning. The technique allows reasoning to happen at both a higher level of abstraction and a lower level of detail in order to optimize the rule set. Information stored at the higher level allows for optimizing processes such as rule splitting, merging and deleting, while information stored at the lower level allows for determining the attribute relevance for a particular rule. Attributes detected as irrelevant can be removed, and ones previously detected as irrelevant can be reintroduced if necessary. The method is evaluated on rules extracted from publicly available real-world datasets using different classifiers, and the results demonstrate the effectiveness of the presented rule optimizing technique.

    Using the symmetrical Tau criterion for feature selection in decision tree and neural network learning

    The data collected for various domain purposes usually contains some features irrelevant to the concept being learned. The presence of these features interferes with the learning mechanism, and as a result the predicted models tend to be more complex and less accurate. It is important to employ an effective feature selection strategy so that only the necessary and significant features will be used to learn the concept at hand. The Symmetrical Tau (τ) [13] is a statistical-heuristic measure of the capability of an attribute in predicting the class of another attribute, and it has successfully been used as a feature selection criterion during decision tree construction. In this paper we aim to demonstrate some other ways of effectively using the τ criterion to filter out irrelevant features prior to learning (pre-pruning) and after the learning process (post-pruning). For the pre-pruning approach we perform two experiments: one where the irrelevant features are filtered out according to their τ value, and one where we calculate the τ criterion for Boolean combinations of features and use the highest τ-valued combination. In the post-pruning approach we use the τ criterion to prune a trained neural network and thereby obtain a more accurate and simpler rule set. The experiments are performed on data characterized by continuous and categorical attributes, and the effectiveness of the proposed techniques is demonstrated by comparing the derived knowledge models in terms of complexity and accuracy.
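To make the criterion concrete, the sketch below computes the symmetrical Goodman-Kruskal Tau from a feature/class contingency table; it is a minimal illustration with illustrative names, not the paper's implementation:

```python
# Hedged sketch of the Symmetrical Tau measure computed from a
# contingency table: table[i][j] = count of rows with feature value i
# and class value j. Returns a value in [0, 1]; 1 means the feature
# perfectly predicts the class (and vice versa), 0 means independence.

def symmetrical_tau(table):
    total = sum(sum(row) for row in table)
    n_rows, n_cols = len(table), len(table[0])
    # joint and marginal probabilities
    p = [[c / total for c in row] for row in table]
    p_row = [sum(row) for row in p]                          # P(i+)
    p_col = [sum(p[i][j] for i in range(n_rows)) for j in range(n_cols)]  # P(+j)

    # predictive power in each direction (proportional error reduction)
    term1 = sum(p[i][j] ** 2 / p_col[j]
                for i in range(n_rows) for j in range(n_cols) if p_col[j] > 0)
    term2 = sum(p[i][j] ** 2 / p_row[i]
                for i in range(n_rows) for j in range(n_cols) if p_row[i] > 0)
    sum_row_sq = sum(pi ** 2 for pi in p_row)
    sum_col_sq = sum(pj ** 2 for pj in p_col)

    return (term1 + term2 - sum_row_sq - sum_col_sq) / (2 - sum_row_sq - sum_col_sq)

print(symmetrical_tau([[10, 0], [0, 10]]))  # perfectly predictive feature
print(symmetrical_tau([[5, 5], [5, 5]]))    # feature independent of class
```

Used as a filter, features would be ranked by this value and those below a chosen threshold discarded before training the decision tree or neural network.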

    Thinking PubMed: an innovative system for mental health domain

    Information regarding mental illness is dispersed over various resources, but even within a specific resource, such as PubMed, it is difficult to link this information, to share it, and to find specific information when needed. Specific and targeted searches are very difficult with current search engines as they look for a specific string of letters within the text rather than its meaning. In this paper we present Thinking PubMed, a system that results from the synergy of ontology and data-mining technologies and performs intelligent information searches using the domain ontology. Furthermore, Thinking PubMed analyzes and links the retrieved information, and extracts hidden patterns and knowledge using data-mining algorithms. This is a new generation of information-seeking tool where the ontology and data mining work in concert to increase the value of the available information.

    Patient and business rules extraction and formalisation using SVN and SBVR for automated healthcare

    This paper describes advances in automated health service selection and composition in the Ambient Assisted Living (AAL) domain. We apply a Service Value Network (SVN) approach to automatically match medical practice recommendations to health services based on sensor readings in a home care context. Medical practice recommendations are extracted from National Health and Medical Research Council (NHMRC) guidelines. Service networks are derived from Medicare Benefits Schedule (MBS) listings. Service provider rules are further formalised using Semantics of Business Vocabulary and Business Rules (SBVR), which allows business participants to identify and define machine-readable rules. We demonstrate our work by applying an SVN composition process to patient profiles in the context of Type 2 Diabetes Management.

    Tree mining application to matching of heterogeneous knowledge

    Matching of heterogeneous knowledge sources is of increasing importance in areas such as scientific knowledge management, e-commerce, enterprise application integration, and many emerging Semantic Web applications. Given the desire for knowledge sharing and reuse in these fields, knowledge coming from different organizations in the same domain commonly needs to be matched. We propose a knowledge matching method based on our previously developed tree mining algorithms for extracting frequently occurring subtrees from a tree-structured database such as XML. Using the method, the common structure among the different representations can be automatically extracted. Our focus is on knowledge matching at the structural level, and we use a set of example XML schema documents from the same domain to evaluate the method. We discuss some important issues that arise when applying tree mining algorithms for the detection of common document structures. The experiments demonstrate the usefulness of the approach.
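The following sketch is not the paper's tree-mining algorithms, but a much simpler illustration of the underlying idea of surfacing shared structure: intersecting the root-to-leaf label paths of two schema trees. Frequent-subtree mining generalises this to arbitrary common substructures across many documents:

```python
# Illustrative only: find label paths shared by two schema trees given as
# (label, [child, ...]) nested tuples. The tree encoding and example
# element names are assumptions for this sketch.

def label_paths(label, children, prefix=()):
    """Yield every root-to-leaf label path of the tree."""
    path = prefix + (label,)
    if not children:
        yield path
    for child in children:
        yield from label_paths(*child, prefix=path)

def common_paths(tree_a, tree_b):
    """Root-to-leaf paths appearing in both trees."""
    return set(label_paths(*tree_a)) & set(label_paths(*tree_b))

# Two hypothetical XML schemas from the same domain
schema_a = ("book", [("title", []), ("author", [("name", [])])])
schema_b = ("book", [("title", []), ("publisher", [])])
print(common_paths(schema_a, schema_b))
```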

    Conjoint data mining of structured and semi-structured data

    With knowledge management requirements growing, enterprises are becoming increasingly aware of the significance of interlinking business information across structured and semi-structured data sources. This problem has become more important with the growing amount of semi-structured data often found in XML repositories, web logs, biological databases, etc. Effectively creating links between semi-structured and structured data is a challenging and unresolved problem. Once an optimized method has been formulated, the process of data mining can be implemented in a conjoint manner. This paper investigates a way in which this challenging problem can be tackled. The proposed method is experimentally evaluated using a real-world database, and its effectiveness and potential in discovering collective information are demonstrated.

    Razor: Mining distance-constrained embedded subtrees

    Our work is focused on the task of mining frequent subtrees from a database of rooted ordered labelled trees. Previously we developed an efficient algorithm, MB3 [12], for mining frequent embedded subtrees from a database of rooted, labelled and ordered trees. The efficiency comes from the utilization of a novel Embedding List representation for Tree Model Guided (TMG) candidate generation. As an extension, the IMB3 [13] algorithm introduces the Level of Embedding constraint. In this study we extend our past work by developing an algorithm, Razor, for mining embedded subtrees where the distance of nodes relative to the root of the subtree needs to be considered. This notion of distance-constrained embedded tree mining has important applications in web information systems, conceptual model analysis and more sophisticated ontology matching. Domains representing their knowledge in a tree-structured form may require this additional distance information, as it commonly indicates the amount of specific knowledge stored about a particular concept within the hierarchy. Structure-based approaches to schema matching commonly take the distance among concept nodes within a sub-structure into account when evaluating concept similarity across different schemas. We present an encoding strategy to efficiently enumerate candidate subtrees taking the distance of nodes relative to the root of the subtree into account. The algorithm is applied to both synthetic and real-world datasets, and the experimental results demonstrate the correctness and effectiveness of the proposed technique.
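The distance notion itself can be illustrated simply: each matched node of an embedded occurrence must sit at the same distance from the occurrence's root as the corresponding node in the pattern. The sketch below uses a parent-pointer tree encoding and function names that are assumptions for this example, not Razor's actual data structures:

```python
# Hedged sketch of the distance constraint in distance-constrained
# embedded subtree mining: an occurrence is kept only if matched nodes
# preserve the pattern's root-relative distances.

def root_distances(parent):
    """parent[i] = index of node i's parent (-1 for the root).
    Returns each node's distance (depth) from the tree root."""
    dist = [0] * len(parent)
    for i in range(len(parent)):
        d, node = 0, i
        while parent[node] != -1:
            node = parent[node]
            d += 1
        dist[i] = d
    return dist

def distance_constrained_match(pattern_dists, occurrence_nodes, tree_parent):
    """pattern_dists[k] is the required distance of the k-th pattern node
    from the pattern root; occurrence_nodes[k] is the tree node matched to
    it (occurrence_nodes[0] is the occurrence root)."""
    dist = root_distances(tree_parent)
    base = dist[occurrence_nodes[0]]
    return all(dist[n] - base == d
               for n, d in zip(occurrence_nodes, pattern_dists))

# Tree: node 0 is root; 1, 2 its children; 3, 4 children of node 1
tree = [-1, 0, 0, 1, 1]
print(distance_constrained_match([0, 2], [0, 3], tree))  # grandchild at distance 2
print(distance_constrained_match([0, 1], [0, 3], tree))  # would require distance 1
```

A plain embedded-subtree miner would accept both occurrences above (ancestor-descendant relationships are preserved either way); the distance constraint is what rejects the second.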

    SEQUEST: Mining frequent subsequences using DMA strips

    Sequential patterns exist in data such as DNA string databases, occurrences of recurrent illness, etc. In this study, we present an algorithm, SEQUEST, to mine frequent subsequences from sequential patterns. Mining a very large database of sequences is computationally expensive and requires a large amount of memory. SEQUEST uses a Direct Memory Access Strips (DMA-Strips) structure to efficiently generate candidate subsequences. The DMA-Strips structure provides direct access to each item to be manipulated and thus is optimized for speed and space performance. In addition, the proposed technique uses a hybrid principle of frequency counting by the vertical join approach and candidate generation by a structure-guided method. The structure-guided method is adapted from the TMG approach used for enumerating subtrees in our previous work [8]. Experiments utilizing very large databases of sequences, which compare our technique with an existing technique, PLWAP [4], demonstrate the effectiveness of our proposed technique.
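To make the core subsequence relation concrete, the sketch below counts the support of a candidate pattern by a naive scan; the DMA-Strips structure and vertical-join counting described in the abstract are far more efficient, and this example is purely illustrative:

```python
# Hedged sketch of subsequence support counting. A pattern occurs in a
# sequence if its items appear in order, with arbitrary gaps allowed.

def is_subsequence(pattern, sequence):
    """True if pattern's items appear in sequence in order (gaps allowed)."""
    it = iter(sequence)
    # 'item in it' advances the iterator past each match, enforcing order
    return all(item in it for item in pattern)

def support(pattern, database):
    """Fraction of database sequences that contain pattern as a subsequence."""
    return sum(is_subsequence(pattern, s) for s in database) / len(database)

db = ["abcab", "acb", "bca"]
print(support("ab", db))  # "ab" occurs in "abcab" and "acb" but not "bca"
```

A frequent-subsequence miner such as SEQUEST keeps exactly those candidate patterns whose support meets a user-specified minimum threshold, extending surviving candidates item by item.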